Using classified text and deep learning algorithms to assess risk and provide early warning

ABSTRACT

Deep learning is used to identify specific risks to an enterprise of a pending litigation and identify documents of interest for the litigation. The system involves mining and using existing classifications of data (e.g., from a litigation database) to train one or more deep learning algorithms, and then examining the electronically stored information with the trained algorithm, to generate a scored output that will enable enterprise personnel to review risks to the enterprise, e.g. to enable enterprise personnel to assess the nature and extent of the potential damage from the litigation, and to identify relevant documents that would be saved to prevent spoliation.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation in part of PCT Application No. PCT/US2017/50555, filed Sep. 7, 2017, which claims the benefit of U.S. Provisional Application Ser. No. 62/357,803, filed on Jul. 1, 2016, all of which are herein incorporated by reference for completeness of disclosure.

BACKGROUND OF THE INVENTION

Field of the Invention

This invention relates generally to machine learning and more specifically to training deep learning algorithms as text classifiers and using them to assess a risk that has been realized in a recently filed lawsuit and identify relevant documents to prevent spoliation.

Description of the Related Art

Law professor Louis M. Brown (1909-1996) advocated “preventive law.” Indeed, he pioneered this concept. His philosophy was this: “The time to see an attorney is when you're legally healthy--certainly before the advent of litigation, and prior to the time legal trouble occurs.” He likened his approach to preventive medicine. However, Prof. Brown passed away before computer hardware and software had reached the point where his concept could be implemented. There are no conferences or journals today which focus on preventive law.

In modern society, entities such as commercial businesses, not-for-profit organizations, governmental agencies, and other ongoing concerns (all hereinafter referred to collectively as “enterprises”) are exposed to potential liabilities if they breach contractual, criminal, governmental, or tort obligations.

In Preventing Litigation: An Early Warning System, Etc. (Business Expert Press 2015) (“Preventing Litigation”), I presented the data showing that the average annual cost of commercial tort litigation in terms of payouts, defense attorneys' fees, and administrative expenses (collectively, “cost”), during the 10-year period from 2001 through 2010, was $160 billion. The total cost for that 10-year timeframe was $1.6 trillion. That pain is enormous.

In Preventing Litigation, I compiled the federal and state caseload for that same 10-year period, and computed the cost per case. The result was $408,000, but I concluded that the cost per case was better set at $350,000, as a minimum.

Since litigation is neither a cost of goods sold nor a cost of services provided, this result indicates a loss to the enterprise of net gains of over $1 million for only three average commercial tort litigation matters, but it was not surprising. It is common knowledge that the cost of litigation is high. On occasion, employee misbehavior, at every level, has violated the rights of another employee, severely impaired an enterprise, harmed an entire marketplace, or physically harmed enterprise employees or members of the public or violated their rights. However, I assert that my data compilation and calculation was the first “per case” derivation of the average cost per case. I showed how much of a losing proposition it is for an enterprise to have to defend a commercial tort litigation matter, even if the client's attorneys are successful in the defense they mount.

Worse, severe misconduct causing massive financial and/or physical harm may escalate to the level where criminal charges are filed. Such charges may be filed against the enterprise and the individuals responsible for the harm. In the early 1990s, the Federal Sentencing Guidelines provided benchmarks for misconduct. The Sentencing Guidelines make room for mitigating conduct and actions that speak against the heaviest penalties. In this context, a system enabling the prevention of harm may function to avoid criminal prosecution altogether. Such a system is evidence of a specific intent to avoid harm, which is the opposite of an element any prosecutor would be forced by law to meet: a specific intent to do harm.

However, litigation can cost an enterprise in still other ways. For example, the enterprise's reputation may suffer, and productivity may be reduced, as when an executive or technology employee receives a litigation hold notice and must divert his or her attention from the matters at hand; meets with in-house or outside counsel; or prepares for and then sits for a deposition or testifies in court.

These high costs and risks are sufficient motivation to find a way to identify the risks of litigation before the damage is done. If a risk can be identified and eliminated before causing damage, the risk cannot give rise to a lawsuit. No civil lawsuit is viable without a good faith allegation of the necessary element of damages.

The attorneys who are closest to the data internal to an enterprise are the attorneys employed by the enterprise. However, these in-house attorneys are blind to the data which contain indications of litigation risks.

There is no software technology or product extant today which permits enterprise employees to identify and surface examples of the risks of being sued while they are still only potential legal liabilities.

Thus, there is a need for a system capable of identifying an enterprise's own internal risks, including but not limited to the risk of litigation, and providing early warning to appropriate personnel.

BRIEF SUMMARY OF THE INVENTION

This invention comprises a computer-enabled software system using deep learning to assess specific risks of a litigation by evaluating electronically stored information (“ESI”). ESIs may be emails and any attachments thereto, a collection of call center notes, a set of warranty claims, other internal company documents, or transcriptions of voice mail messages. One or more embodiments of the invention rely on existing classifications of litigation data to train one or more deep learning algorithms, and then to examine ESIs with them, to identify relevant data and assess risk of a newly filed lawsuit. A computer-based examination of ESIs could be near real-time, e.g., overnight. After all, the purpose of an early warning system is to enable the enterprise to be proactive instead of reactive, and as soon as possible.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other aspects, features and advantages of the invention will be more apparent from the following more particular description thereof, presented in conjunction with the following drawings wherein:

FIG. 1 is a flow chart illustration of the process for using classified text and deep learning algorithms to identify risk and provide early warning in accordance with one or more embodiments of the present invention.

FIG. 2 illustrates a general-purpose computer and peripherals that when programmed as described herein may operate as a specially programmed computer capable of implementing one or more methods, apparatus and/or systems of the present invention.

FIG. 3 is a bar graph illustration of email score frequencies above 0.80 for 400 training documents.

FIG. 4 is a bar graph illustration of email score frequencies after training to find employment discrimination risks.

FIG. 5 is a graph of Receiver Operating Characteristic (ROC) and related Area Under the Curve (AUC).

DETAILED DESCRIPTION

The present invention comprising using classified text and deep learning algorithms to identify risk and provide early warning will now be described. In the following exemplary description numerous specific details are set forth in order to provide a more thorough understanding of embodiments of the invention. It will be apparent, however, to an artisan of ordinary skill that the present invention may be practiced without incorporating all aspects of the specific details described herein. Furthermore, although steps or processes are set forth in an exemplary order to provide an understanding of one or more systems and methods, the exemplary order is not meant to be limiting. One of ordinary skill in the art would recognize that one or more steps or processes may be performed simultaneously or in multiple process flows without departing from the spirit or the scope of the invention. In other instances, specific features, quantities, or measurements well known to those of ordinary skill in the art have not been described in detail so as not to obscure the invention. It should be noted that although examples of the invention are set forth herein, the claims, and the full scope of any equivalents, are what define the metes and bounds of the invention.

For a better understanding of the disclosed embodiment, its operating advantages, and the specified object attained by its uses, reference should be made to the accompanying drawings and descriptive matter in which there are illustrated exemplary disclosed embodiments. The disclosed embodiments are not intended to be limited to the specific forms set forth herein. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient, but these are intended to cover the application or implementation.

The terms “first”, “second” and the like, herein do not denote any order, quantity or importance, but rather are used to distinguish one element from another, and the terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced item. The terms “email” and “Email” both refer to an email and any attachment.

The term “algorithm” refers to a “deep learning algorithm,” “deep learning neural network,” or a “deep learning model,” all of which here refer to a form of text classification.

One or more embodiments of the present invention will now be described with references to FIGS. 1-5.

FIG. 1 is a flow chart illustration of the process 100 for using classified text and deep learning algorithms to identify risk and provide early warning. As illustrated, the process 100 begins at block 102 with mining of data for training one or more deep learning algorithms. In the typical instance, subject matter experts identify one or more datasets with classifications of risk or threats having a sufficient number of textual documents. These classifications (or categories or labels) of risk are more typically from sources outside of the enterprise. The system data-mines such classified datasets to extract a sufficient number of documents within a specific category to train one or more deep learning algorithms.

In the context of litigation risk, for example, a subject matter expert would note that the federal court litigation database known as the Public Access to Court Electronic Records (PACER) is based on well over one hundred case types, to which PACER assigns Nature of Suit (NOS) codes. When a federal court lawsuit is initiated, the person filing it must complete and file a form called a Civil Cover Sheet, which instructs the person responsible for filing the lawsuit to review a list of NOS codes and choose one and only one code which best describes the lawsuit, even if there is more than one theory of recovery.

To create a set of training documents for a particular federal court litigation risk, a user of this invention would use PACER's Application Programming Interface (API) to obtain hundreds if not thousands of text documents from previous lawsuits which have been filed in a specific NOS category. Such a user would then compile a large number of training documents which describe the facts (not the law) which prompted the lawsuit to be filed in the first place.

In this illustrative example, PACER would be a generic source of classified text outside of the enterprise, which is used as training data. A ready (but not the only) source of training data is a lawsuit complaint in a specific case-type category, as identified by its NOS category name or number.

NOS categories are not difficult to understand. One of them, for example, is the category of Civil Rights-Employment which, in other words, means illegal discrimination against employees, whether it is for age, race, sex, or some other subclass of illegal discrimination. In addition, and among over one hundred categories, there are codes for breach of contract, fraud and product liability case-types.

There may be other sources of training data, i.e. internal enterprise sources of specific litigation case-type training data. Some examples are: textual data from the previous litigation history of an enterprise; text in warranty claims; and data from the confirmation by a user that a specific system output document (e.g., email) has been scored by the algorithm in a way indicating that it should be saved and used for re-training the algorithm.

Using litigation case-type data, a complaint is usually (but not always) identified as the first document in a litigation docket, i.e. Document Number 1. In order to train a deep learning algorithm, the focus is on the factual allegations in these complaints. These factual allegations may be stated in a section of a complaint entitled “Background Facts” or the like.

Many sections of a complaint are unnecessary and consist of legalistic statements that can be deleted for the purpose of training the algorithm. For example, the sections that pertain to jurisdiction, venue, the identification of the parties, the legal theories for recovery, and the prayer for damages are unnecessary. By deleting unnecessary text, the amount of training data is reduced, and the training data will contain less “noise” for analysis by the deep learning algorithm.
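As a rough illustration of the trimming described above, the following Python sketch keeps only the text that appears under a factual heading and discards the other sections. The heading names, the all-capitals heading pattern, and the function name `extract_fact_section` are assumptions made for illustration; real complaints vary widely in layout and would likely need more careful handling.

```python
import re

# Hypothetical heading names; real complaints use many variants.
FACT_HEADINGS = ("BACKGROUND FACTS", "FACTUAL ALLEGATIONS", "STATEMENT OF FACTS")

# Assumption: a section heading is a line written entirely in capitals.
HEADING_RE = re.compile(r"^[A-Z][A-Z ,.'&-]+$")


def extract_fact_section(complaint_text):
    """Return the text found under any factual heading, joined into one block."""
    kept = []
    in_facts = False
    for line in complaint_text.splitlines():
        stripped = line.strip()
        if HEADING_RE.match(stripped):
            # Entering a new section: keep it only if it is a factual one.
            in_facts = stripped in FACT_HEADINGS
            continue
        if in_facts and stripped:
            kept.append(stripped)
    return " ".join(kept)


sample = (
    "JURISDICTION AND VENUE\n"
    "This Court has jurisdiction.\n"
    "BACKGROUND FACTS\n"
    "Plaintiff was hired in 2010.\n"
    "She was denied promotion.\n"
    "PRAYER FOR RELIEF\n"
    "Plaintiff seeks damages.\n"
)
print(extract_fact_section(sample))
```

In this sketch the jurisdiction and prayer sections are dropped, and the two factual sentences are returned as a single block of text.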

In some cases, the plaintiff in a case may be represented by an attorney. In that case, the fact section is based on information provided by the attorney's client and by the information stemming from the attorney's research. Because attorneys typically present facts in a logical way, so as to be both understood and persuasive, we assume that such facts have been vetted. For this reason, complaints written by counsel are a prime source of training data.

However, additional facts may be developed after the complaint is filed and during the discovery of electronically stored information, e.g., by way of production of documents, responses to written interrogatories, or by testimony given during depositions and the like. Although such facts are generally not placed in the public record, certain key facts are put into the public record (e.g. in PACER) as part of motions for summary judgment.

The data mining needed to create a strong deep learning algorithm aims at surfacing a large number of factual allegations within a specific risk case-type.

In one or more embodiments, the system's primary, but not only, source of training data consists of the facts alleged in complaints previously filed in a specific category of lawsuit.

Such litigation data is positive training data, and typically contains no emails. The system of the present invention seeks to surface test data that is “related” to the aggregation of these positive facts. The degree of the relation is reported by an accuracy score ranging from 0.50 to a maximum of 1.0. The training data also includes negative training data, such as text concerning some unrelated topic, e.g., the Holy Roman Empire. (Negative, unrelated training data may be obtained from Wikipedia, for example.) The system uses negative training data to better score test data as either related or unrelated to a specific case-type.
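One way to picture the combination of positive and negative training data is as a single list of labeled documents. The sketch below is illustrative only: the label values (1 for related, 0 for unrelated), the shuffling step, and the function name `build_training_set` are assumptions, not details taken from this disclosure.

```python
import random


def build_training_set(positive_docs, negative_docs, seed=0):
    """Label positive documents 1 and negative documents 0, then shuffle."""
    examples = [(doc, 1) for doc in positive_docs]
    examples += [(doc, 0) for doc in negative_docs]
    # A fixed seed keeps the shuffled order reproducible between runs.
    random.Random(seed).shuffle(examples)
    return examples


positive = ["Plaintiff was passed over for promotion because of her age."]
negative = ["The Holy Roman Empire was dissolved in 1806."]
training_set = build_training_set(positive, negative)
print(len(training_set))
```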

The training data is crucial for a deep learning engine to be able to produce an accuracy score for the text in the test data, which typically consists primarily of emails. The algorithm can produce an accuracy score by comparing an email, as encoded, to the vector space described by the positive training data related to the risk, and to the negative training data, which is unrelated.

The process of aggregating this training material and providing it to a deep learning engine involves creating a “vector” for each word in the block in relation to the two or three words before and after it. Accordingly, each word vector has its own context, and that context is meaningful in connection with the type of case (and the type of risk) for which the deep learning algorithm is being trained. Transforming the text used in a specific classification (or, for litigation, specific category or type of case) into numerical vectors may be accomplished via various methods such as Word2vec by Tomas Mikolov at Google, “GloVe: Global Vectors for Word Representation” by Jeffrey Pennington, et al., etc.
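The notion of a word “in relation to the two or three words before and after it” can be sketched as a sliding context window. The Python example below only gathers each word's neighbors; it does not train word vectors, which methods such as Word2vec or GloVe would do from windows of this kind. The function name `context_pairs`, the whitespace tokenization, and the default window of two words are assumptions for illustration.

```python
def context_pairs(text, window=2):
    """Pair each word with the words up to `window` positions before and after it."""
    words = text.lower().split()
    pairs = []
    for i, center in enumerate(words):
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        # Neighbors on both sides, excluding the center word itself.
        context = words[lo:i] + words[i + 1:hi]
        pairs.append((center, context))
    return pairs


pairs = context_pairs("she was denied a promotion", window=2)
for center, context in pairs:
    print(center, context)
```

Here the word "denied" is paired with "she", "was", "a", and "promotion", which is the context an embedding method would use when placing "denied" in the vector space.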

However, to make the matter clear, although the deep learning algorithm will encode the text in the above-described manner, i.e. words within the context of other words, the factual allegations are not provided to the algorithm word by word, sentence by sentence, or paragraph by paragraph. Instead, the whole block of factual allegations is presented for ingestion as a document.

One object of amassing a sufficient number (hundreds if not thousands) of training documents is to train a deep learning algorithm so that it functions well, and so is considered “strong.” Consequently, at step 104, category-specific training documents are passed to, and ingested by, one or more deep learning algorithms best suited to handle natural language processing (NLP). The algorithm more commonly used in the context of NLP and text analysis is known to practitioners in the art as a recurrent neural network (RNN).

Such deep learning RNNs use hidden computational “nodes” and various “gates,” and require manipulation known in the art as “tuning.” After the process of “tuning”, the algorithm will be evaluated to assess the degree to which it accurately identifies the textual test data it has never before encountered with the “vector space” it has been trained to recognize. As one example, practitioners construct a Receiver Operating Characteristic (ROC) graph and calculate the Area Under the Curve (AUC) score. An ROC graph plots the true positive rate (on the y-axis) against the false positive rate (on the x-axis). Because the maximum AUC score is one (1.0), an ROC-AUC score in the mid-nineties, e.g., 0.95, indicates that the algorithm ranks related documents above unrelated ones in the large majority of cases. In experiments, described below, the RNNs of the present invention have achieved scores above 0.967.
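For readers unfamiliar with the AUC score, the following sketch computes it directly from its ranking interpretation: the fraction of (related, unrelated) document pairs in which the related document receives the higher score, with ties counted as one half. This is a standard equivalent of the area under the ROC curve, shown here in plain Python for clarity; it is not the evaluation code used in the experiments, and the labels and scores in the example are made up.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs that are ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


# Illustrative labels (1 = related, 0 = unrelated) and scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # prints 0.75
```

In the example, three of the four positive/negative pairs are ordered correctly, giving 0.75; a score of 1.0 would mean every related document outranks every unrelated one.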

When an algorithm is trained to “understand” a particular type of case, it may be thought of as a “filter.” Typically, the system will consist of more than one filter. The system passes the enterprise data through each filter. The reason is clear: A deep learning algorithm trained to identify “breach of contract” risks, which we now call a filter, may find no risk in the test data, but an “employment discrimination” filter may find one or more high-scoring emails in the same data.

Once the Deep Learning Engine is trained, at step 106 the system indexes and also extracts text from each Email and any attachment. Those of skill in the art will appreciate that other implementations are contemplated. For example, it may be necessary or desirable to only extract the subject line and the unstructured text in the message field. In addition, the indexed data may be stored for a specific period of time in a database, a period which the Enterprise may designate in accordance with its data destruction policies.
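The narrower extraction mentioned above, limited to the subject line and the unstructured message text, might look like the following sketch, which uses the `email` package from the Python standard library. The function name `subject_and_body` and the choice to skip any part that carries a filename are assumptions made for illustration; indexing and database storage are not shown.

```python
from email import message_from_string


def subject_and_body(raw_email):
    """Return (subject, plain-text body) for a raw RFC 822 style message."""
    msg = message_from_string(raw_email)
    subject = msg.get("Subject", "")
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no text themselves
        if part.get_content_type() != "text/plain":
            continue
        if part.get_filename() is not None:
            continue  # assumption: parts with a filename are attachments
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        parts.append(payload.decode(charset, errors="replace"))
    return subject, "\n".join(parts).strip()


raw = (
    "From: a@example.com\n"
    "Subject: Please help\n"
    "\n"
    "I was treated unfairly.\n"
)
subject, body = subject_and_body(raw)
print(subject)
print(body)
```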

In one or more embodiments, the system at step 106 may operate in a non-real-time mode by extracting existing internal email data, e.g. from the previous day's Email, and then storing the data in a database. In other embodiments, the system at step 106 may operate in real-time to intercept, index, store and extract text from internal Email data.

After indexing, extracting text, and storing the internal Email data in a database, the system passes that data to each of the category-specific algorithms at step 108, which are also referred to herein as “filters.” Each filter scores the data for each Email for accuracy in comparison to how each filter was trained.
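The step of passing every Email through every filter can be sketched as a simple nested loop. In the Python example below, each filter is represented by a placeholder keyword scorer so that the code runs on its own; a real filter would be a trained deep learning model, and the keyword lists, filter names, and function names are all hypothetical.

```python
def make_keyword_filter(keywords):
    """Placeholder scorer: fraction of the given keywords found in the text.

    This stands in for a trained deep learning model and is NOT one.
    """
    def score(text):
        lowered = text.lower()
        hits = sum(1 for k in keywords if k in lowered)
        return hits / len(keywords)
    return score


def score_emails(emails, filters):
    """Run every email through every filter; return (filter, email_id, score) rows."""
    rows = []
    for email_id, text in emails.items():
        for name, scorer in filters.items():
            rows.append((name, email_id, scorer(text)))
    return rows


filters = {
    "employment": make_keyword_filter(["unfair", "treatment"]),
    "contract": make_keyword_filter(["breach", "contract"]),
}
emails = {"e1": "My unfair treatment at work"}
rows = score_emails(emails, filters)
for row in rows:
    print(row)
```

The same email scores high on the hypothetical employment filter and zero on the contract filter, which mirrors the reason given above for running more than one filter over the same data.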

Once each Email is scored for accuracy in relation to the risks or threats by one of the filters, the score and text are output at step 110. The Emails related to a particular risk may be reported as an “early warning” alert to specific employees, for example. These employees may be a pre-determined list of in-house attorneys, paralegals or other employees in the legal department.

In addition, an enterprise may configure the system to send its “early warning” alert to a list of devices owned by the enterprise where the communication of the output may be encrypted using appropriate security measures.

When a scored Email is reported to a designated enterprise employee, that employee may be enabled to review the Email in one or more of several modes, e.g., the scored text in a spreadsheet format and/or a bar graph distribution. Using the spreadsheet format, after reviewing a row of score and Email text, a user may call that Email to the fore and review it in its native state. This feature is possible because the Emails were indexed when they were copied into the system.

At step 112, when a determination is made, e.g. by a reviewer such as an attorney or paralegal, that a specific Email is, at least provisionally, a false positive (and that further investigation is not warranted), the process proceeds to step 118 where the email is stored in a False Positive database. A user interface, e.g. graphical, may be provided for the reviewer to perform the necessary designations, for example. Those of skill in the art would appreciate that the system could be configured to perform this determination step automatically, with the reviewer having a veto or override capability, for example.

However, if at step 112 a determination is made that an identified Email is a true positive, a copy of that Email may be placed in a True Positive database at step 114. When a designated number of Emails have been saved for either purpose, the positive or negative training data may be updated. In this way, the generic training data may be augmented with company-specific training data. With this additional training data, the deep learning algorithms may be re-trained in steps 120 and/or 122 to amplify the positive or negative vector spaces for each filter, and to better reflect the enterprise's experience and culture.
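The review loop of steps 112 through 122 can be pictured as two stores plus a counter that signals when enough reviewed Emails have accumulated to justify re-training. The class below is a minimal in-memory sketch; the class name `ReviewStore`, the `retrain_after` parameter, and the use of Python lists in place of the True Positive and False Positive databases are assumptions for illustration.

```python
class ReviewStore:
    """Holds reviewer decisions and signals when a re-training batch is full."""

    def __init__(self, retrain_after=3):
        self.true_positives = []    # stands in for the True Positive database
        self.false_positives = []   # stands in for the False Positive database
        self.retrain_after = retrain_after

    def record(self, email_id, is_true_positive):
        """Store the decision; return True each time a batch is completed."""
        if is_true_positive:
            target = self.true_positives
        else:
            target = self.false_positives
        target.append(email_id)
        return len(target) % self.retrain_after == 0


store = ReviewStore(retrain_after=2)
print(store.record("e1", True))    # first true positive, batch not yet full
print(store.record("e2", False))   # first false positive, batch not yet full
print(store.record("e3", True))    # second true positive completes a batch
```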

Email marked as true positive and placed in the True Positive database at step 114 may be exported via an API to the enterprise's existing litigation or investigation case management system at step 116, if any, from which the risk may be optionally addressed. The algorithm's positive output may be limited to scores which surpass a user-specified threshold, for example.
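Limiting the output to scores at or above a user-specified threshold is a one-line filter. The sketch below also sorts the surviving Emails from highest to lowest score; the function name `alerts_above`, the default threshold of 0.90, the inclusive comparison, and the sample scores are assumptions for illustration.

```python
def alerts_above(scored, threshold=0.90):
    """Keep (email_id, score) pairs at or above the threshold, highest first."""
    hits = [(email_id, score) for email_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


scored = [("a", 0.95), ("b", 0.50), ("c", 0.90)]
print(alerts_above(scored))  # prints [('a', 0.95), ('c', 0.9)]
```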

The system of the invention we have now described has two additional advantages. The first advantage is confidentiality. If the enterprise directs its legal department to install the system and have its attorneys direct and control its operation, including any involvement by the IT department, then the attorneys using the system may invoke the attorney work-product doctrine. Then, when the system provides an output to a designated list of attorneys or other legal department personnel, such as paralegals, the enterprise may again invoke the attorney work-product doctrine when someone in the legal department decides which emails to investigate. Similarly, the work-product doctrine should apply when legal department personnel use an API to access and use whatever case or investigation management platform the enterprise uses.

In addition, when an investigation appears to warrant further action of a proactive, preventive nature, the enterprise attorneys may advise a control group executive in order to invoke the attorney-client privilege.

Thus, by installing and operating the system in the manner described above, the invention provides confidentiality to the sensitive information that is being brought to light.

The second advantage arises whenever a regulatory investigation becomes problematic. Should a governmental entity file criminal charges against the enterprise and/or any of its personnel, the prosecuting authorities will have to present evidence of a specific intent to do harm. But by installing and operating the system of this invention in good faith, the enterprise and anyone so charged will have countervailing evidence of a specific intent to avoid harm.

To summarize: Once the deep learning algorithm is trained, the system has three major subsystems: enterprise data, the deep learning algorithms or filter(s), and the output data. Taken together, the system operates to identify a potentially adverse risk to the enterprise and provide early warning to a user. In the exemplary embodiments provided herein, the potentially adverse risk is the risk of a specific type of litigation but could just as well be other types of risk, including the risk of physical harm to the enterprise's customers by the enterprise's products.

FIG. 2 diagrams a general-purpose computer and peripherals 200 that, when programmed as described herein, may operate as a specially programmed computer capable of implementing one or more methods, apparatus and/or systems of the solution described in this disclosure. Processor 207 may be coupled to bi-directional communication infrastructure 202 such as communication infrastructure system bus 202. Communication infrastructure 202 may generally be a system bus that provides an interface to the other components in the general-purpose computer system such as processor 207, main memory 206, display interface 208, secondary memory 212 and/or communication interface 224.

Main memory 206 may provide a computer readable medium for accessing and executing stored data and applications. Display interface 208 may communicate with display unit 210 that may be utilized to display outputs to the user of the specially-programmed computer system. Display unit 210 may comprise one or more monitors that may visually depict aspects of the computer program to the user. Main memory 206 and display interface 208 may be coupled to communication infrastructure 202, which may serve as the interface point to secondary memory 212 and communication interface 224. Secondary memory 212 may provide additional memory resources beyond main memory 206, and may generally function as a storage location for computer programs to be executed by processor 207. Either fixed or removable computer-readable media may serve as secondary memory 212. Secondary memory 212 may comprise, for example, hard disk 214 and removable storage drive 216 that may have an associated removable storage unit 218. There may be multiple sources of secondary memory 212 and systems implementing the solutions described in this disclosure may be configured as needed to support the data storage requirements of the user and the methods described herein. Secondary memory 212 may also comprise interface 220 that serves as an interface point to additional storage such as removable storage unit 222. Numerous types of data storage devices may serve as repositories for data utilized by the specially programmed computer system. For example, magnetic, optical or magnetic-optical storage systems, or any other available mass storage technology that provides a repository for digital information may be used.

Communication interface 224 may be coupled to communication infrastructure 202 and may serve as a conduit for data destined for or received from communication path 226. A network interface card (NIC) is an example of the type of device that once coupled to communication infrastructure 202 may provide a mechanism for transporting data to communication path 226. Computer networks such as Local Area Networks (LAN), Wide Area Networks (WAN), Wireless networks, optical networks, distributed networks, the Internet or any combination thereof are some examples of the type of communication paths that may be utilized by the specially programmed computer system. Communication path 226 may comprise any type of telecommunication network or interconnection fabric that can transport data to and from communication interface 224.

To facilitate user interaction with the specially programmed computer system, one or more human interface devices (HID) 230 may be provided. Some examples of HIDs that enable users to input commands or data to the specially programmed computer may comprise a keyboard, mouse, touch screen devices, microphones or other audio interface devices, motion sensors or the like. Any other device able to accept any kind of human input and in turn communicate that input to processor 207 to trigger one or more responses from the specially programmed computer is also within the scope of the system disclosed herein.

While FIG. 2 depicts a physical device, the scope of the system may also encompass a virtual device, virtual machine or simulator embodied in one or more computer programs executing on a computer or computer system and acting or providing a computer system environment compatible with the methods and processes of this disclosure. In one or more embodiments, the system may also encompass a cloud computing system or any other system where shared resources, such as hardware, applications, data, or any other resource are made available on demand over the Internet or any other network. In one or more embodiments, the system may also encompass parallel systems, multi-processor systems, multi-core processors, and/or any combination thereof. Where a virtual machine, process, device or otherwise performs substantially similarly to that of a physical computer system, such a virtual platform will also fall within the scope of disclosure provided herein, notwithstanding the description herein of a physical system such as that in FIG. 2.

Experimental Results

Embodiments of the present invention were validated using training data for the employment discrimination case-type, two similar (but different) deep learning algorithm providers, and a portion of Ken Lay's Enron email corpus. The system found one (1) risky email out of 7,665 emails.

The system as described herein requires multiple types of factual information. For example, the factual information may include but is not limited to a compilation of factual allegations previously presented as pre-litigation demands; a compilation of factual allegations previously presented as part of filed lawsuits; factual details extracted from hypothetical examples of potential legal liability as identified and preserved by authorized personnel; factual details extracted from learned treatises; factual details from employee complaints; and factual details from customer complaints.

First, text data from prior court cases (and other sources) pertaining to employment discrimination lawsuits were extracted as an example of many case-types. Second, the text data was used to train two deep learning algorithms in two ways, with documents that were related to prior employment discrimination lawsuits, and with documents that were clearly not related to an employment discrimination risk. There were no emails in the training data.

Next, as trained, the deep learning algorithms were presented with test data consisting of a portion of the Enron email subset for Ken Lay, the former Chairman and CEO of Enron. The test data consisted of only these emails.

Before the experiments, the PACER database was reviewed for statistics about Enron. For the five-year period 1997-2001, the chance of finding a workplace discrimination case against an Enron company was only about one percent (1%). During that five-year timeframe, an Enron company was named in litigation 1,339 times, and was named in an employment discrimination case only 13 times. Accordingly, there was no expectation of a significant result because it is unlikely that employees with a discrimination complaint would reach out to Ken Lay. Ken Lay was, after all, the Chairman and CEO of Enron, not a manager and not the director of Human Resources.
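The "about one percent" figure follows directly from the two PACER counts given above, as this short check shows. The helper name `share_percent` is an assumption for illustration.

```python
def share_percent(part, total):
    """Return part/total as a percentage, rounded to two decimals."""
    return round(100.0 * part / total, 2)


# 13 employment discrimination cases out of 1,339 cases naming an Enron company.
print(share_percent(13, 1339))  # prints 0.97
```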

Next, PACER was data-mined to extract certain text portions of documents filed in the employment discrimination category to create a set of training documents in this silo.

The first experiment was with a deep learning algorithm provided by MetaMind, which was later acquired by Salesforce. The amount of training data was increased in small increments. The first experiment used only 50 training documents, but provided immediate results, which was surprising and unexpected, in part because Ken Lay was the Chairman and CEO of Enron, not the director of the Human Resources department.

As configured for the experiment, the system reported the results in two formats. The first format is an Excel spreadsheet. There, in the first row, in Column A, the system shows the scores which indicate the accuracy of the Email text compared to the case-type for which the algorithm was trained. In Column B, the system shows a portion of the Email text associated with the score. Twenty-two (22) emails were found which scored at 0.90 or above for accuracy out of 6,352 emails, and two of them expressed a risk of employment discrimination, with one being a “forward” of the other.
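The first report format described above can be sketched as follows. This is a minimal illustration, assuming the trained algorithm has already produced a list of (score, text) pairs. The function names `build_report` and `to_csv`, the 80-character snippet length, and the sample emails are hypothetical and are not taken from the experiments. A CSV file stands in for the Excel spreadsheet.

```python
import csv
import io


def build_report(scored_emails, threshold=0.90, snippet_len=80):
    """Return (score, snippet) rows at or above the threshold, highest score first.

    Column A of the report is the score; Column B is a portion of the email text.
    """
    rows = [
        (score, text[:snippet_len])
        for score, text in scored_emails
        if score >= threshold
    ]
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


def to_csv(rows):
    """Serialize report rows as CSV text (a stand-in for the spreadsheet)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Hypothetical scored emails; real scores would come from the trained model.
scored = [
    (0.41, "Lunch on Thursday?"),
    (0.94, "Placeholder text for a high-scoring email"),
    (0.91, "Quarterly budget review attached"),
]

report = build_report(scored)
csv_text = to_csv(report)
```

Only the rows at or above the threshold reach the reviewer, which keeps the report short when the corpus contains thousands of emails.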

The second format is a bar graph of the data scored by the algorithm, illustrated in FIG. 3. The bar graph is a distribution which provides context for the highest scoring emails. On the x-axis, the bar graph shows the scores for the emails. The highest possible score is 1.0 and is on the far right. On the y-axis, the graph shows the number of emails which received any particular score. The distribution bar graph only shows the scores ranging from 0.80 to 1.00.
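The distribution behind such a bar graph can be sketched as below. This is an illustration only, assuming scores are reported to two decimal places. The function names `score_distribution` and `render_bars` and the sample scores are hypothetical, and a text rendering stands in for the plotted figure.

```python
from collections import Counter


def score_distribution(scores, low=0.80, high=1.00):
    """Count emails per score (rounded to two decimals) within [low, high]."""
    low_bin, high_bin = round(low * 100), round(high * 100)
    # Work in integer hundredths to avoid floating-point bin-edge surprises.
    counts = Counter(round(score * 100) for score in scores)
    return {b / 100: counts.get(b, 0) for b in range(low_bin, high_bin + 1)}


def render_bars(distribution):
    """Render one text bar per score: x-axis value, then one '#' per email."""
    return "\n".join(
        f"{score:.2f} | {'#' * count}"
        for score, count in sorted(distribution.items())
    )


# Hypothetical scores; the 0.42 email falls outside the displayed range.
sample_scores = [0.97, 0.94, 0.94, 0.88, 0.86, 0.42]
distribution = score_distribution(sample_scores)
chart = render_bars(distribution)
```

Restricting the range to 0.80 through 1.00 mirrors the figure described above, where only the high-scoring tail is of interest to a reviewer.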

In reviewing the top-scoring 22 emails, i.e. the ones which scored 0.90 or above, the data showed that most of them were false positives, but two emails (as noted above) stood out. Scoring at 0.94, both of them presented a discrimination risk, but it was the same risk, because one email was a “forward” of the initial version. The subject of that email was “[M]y unfair treatment at Enron—Please HELP.”

After further training, to 400 documents, the number of false positives was reduced. The deep learning algorithm scored only four (4) emails at 0.86 or higher.

In the resulting spreadsheet, lines 3 and 4 scored at the 0.86 and 0.88 levels respectively. Those emails include the phrase “my unfair treatment at Enron.” Upon further review, the first paragraph of the email in the spreadsheet began: “Dear Mr. Lay, [M]y employment with Enron is to be terminated, the reason given by HR, for not meeting performance standards. However, I firmly believe that this is not the real reason, the real one being for defying the wishes of my manager . . . , who, I believe was acting in a discriminatory way towards me . . . ” (Boldface and italics added.)

As the number of our training documents increased, it became evident that the deep learning algorithm was becoming more accurate. In addition, the inclusion of a negative dataset (and vector space), e.g. pertaining to the Holy Roman Empire and calendar entries, also reduced the number of false positives in the results.

Further experiments with a list of sex and race terms demonstrated that they added little, if anything, to the strength of the algorithm. This may be because the lists lacked any context and were as insufficient as any list of key words.

In the experiment pertaining to the spreadsheet where four high-scoring Emails were identified, about 400 training documents were used. As previously mentioned, only four emails scored at or above 0.80 for accuracy with respect to the training data. Two emails scored above 0.90 for accuracy, while the other two scored 0.88 and 0.86. The distribution bar graph for a subsequent experiment using 7,665 Ken Lay emails is illustrated in FIG. 4. The y-axis runs from 0 to only 12, indicating that the deep learning algorithm was now much more focused.

The experiments also showed that a deep learning algorithm, however well trained, will nevertheless generate an alert that a reviewer would reject for follow up. For example, the Email scoring at 0.97 was from an Enron employee in India. In part, it read: “Subsequently, I was forced upon a cheque of severance amount on 27 Aug. 2001 which I received under protest and the notice of such protest was served by me on firm in the interest of justice. I submit such a termination is illegal, bad in law and void ab-initio and accordingly when such an action was not corrected by firm, I [was] constrained to approach the Court of law.” (Boldface and italics added.)

Thus, while that Email recounts a discrimination risk, the risk appears to have already become a lawsuit. A reviewing attorney might well consider this high-scoring email to be a false positive, especially if he or she determines that a lawsuit has already been filed.

A second experiment used the Deep Learning Algorithm of Indico Data Systems, Inc. (“Indico”) to validate the previous training with MetaMind. Indico was provided with the same training data used with MetaMind and with the same test data.

The results showed that the same risky Email found using MetaMind, which had text in the subject line stating “unfair treatment at Enron” and which scored 0.86 and 0.88 with MetaMind, was flagged by Indico's model. With Indico, the same Email scored 0.89, which is comparable.

Indico also used about 75 held-out documents in order to provide a graph of a curve showing a Receiver Operating Characteristic, illustrated in FIG. 5, which included a related indicator called the Area Under the Curve (AUC). These statistics put “true positives” on the y-axis and “false positives” on the x-axis. On both axes, the maximum score is 1.0. A strong algorithm will score high and to the left, which is what we saw.

Also, since the maximum score for each axis is 1.0, the maximum possible area under the ROC curve is also 1.0. According to the second provider, Indico, the AUC score was 0.967 (see the AUC score in the lower right hand corner), which means that the algorithm, as trained, is strong.
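One way to compute an area under the ROC curve is sketched below. It assumes held-out documents with known labels (1 for risky, 0 for not risky) and model scores. The function name `roc_auc` and the sample values are hypothetical, and this is not a description of how Indico computed its reported figure. The sketch uses the pairwise-ranking formulation, which gives the same value as the area under the curve of true positive rate plotted against false positive rate.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    For every (positive, negative) pair, count 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise; the AUC is the average over all pairs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out labels and scores, for illustration only.
held_out_labels = [1, 1, 0, 0]
held_out_scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(held_out_labels, held_out_scores)  # 3 of 4 pairs ranked correctly
```

A value of 1.0 means every risky document outscored every non-risky one, while 0.5 corresponds to random ranking, which is why a reported value near 1.0 indicates a strong classifier.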

In these experiments, the early warning system, having been trained (as an example) to detect an employment discrimination risk, found one (1) email out of 7,665 emails which signaled just such a risk. Thus, the inventive concept of the system has been tested using a specific litigation category, the same training data, two different algorithms, and Enron test data, and has functioned in accordance with the inventive concept. The system, as trained, had found a needle in a haystack.

Furthermore, the early warning system would function for an authorized enterprise employee in an unobtrusive way. Enterprise personnel need not be involved in data-mining to train the Deep Learning system, and the algorithm itself would scan internal emails and would run in the background. The system would not require anyone's attention until the system had a reportable result, at a threshold the legal department could set.

At that point, the system would output an alert to a privileged list of in-house personnel, likely in-house counsel or employees under their direction and control, and they would be able to see a spreadsheet with a score in column A, the related text in column B, along with a bar chart for context, and then would be enabled to call forward the Emails of interest.

The experiments discussed above employed deep learning algorithms which are described in the academic literature as Recursive Neural Tensor Networks or as Recurrent Neural Networks (RNNs) with either Long Short-Term Memory (LSTM) or Gated Recurrent Units (GRUs). Those of skill in the arts will appreciate that use of other algorithms, including those which are now open-sourced, is contemplated.

Risk Assessment and Identification of Relevant Data for a Pending Lawsuit.

In one or more embodiments of the present invention, the context changes to a time when a lawsuit complaint has been filed and served and has come to the attention of an enterprise defendant, whether commercial or governmental. Now the enterprise must engage in a host of e-Discovery processes. One of the earliest of these processes is Early Case Assessment. The purpose is to assess the significance of the lawsuit.

Prior to doing so, however, the enterprise must understand the lawsuit's nature and which of its personnel (employees or otherwise) might possess documents, e.g., emails, attachments and other, stand-alone documents such as memoranda or reports, that are potentially relevant to the lawsuit's allegations. As such, they may be custodians of potentially relevant evidence which must be preserved for further analysis. The enterprise must provide such custodians with “litigation hold notices.” Such notices are designed to avoid spoliation of documents that may be potentially relevant to the matter as well as to enable the enterprise to collect and aggregate the potentially relevant documents. Eventually, such collections will enable the enterprise to comply with future requests by the opposing party for the discovery of electronically stored information (ESI).

In a standard discovery workflow, the aggregation of potentially relevant documents provides the enterprise with a corpus that may be examined for an early warning of whether the complaint poses a serious or insignificant threat of high costs or other damages, e.g. to brand or personal reputations in addition to defense attorney fees, a large settlement or verdict, and, in the worst case, future adverse actions by regulators or the filing of criminal charges by prosecutors. This early examination process is called Early Case Assessment (ECA).

ECA is a post-litigation process that would use embodiments of the present invention, especially if a deep learning model has already been built for the particular litigation risk described by the complaint.

Accordingly, instead of focusing on the set of yesterday's emails in order to find a risk, e.g., of a specific type of litigation, the post-litigation variation would focus each model on the corpus of ESI collected from the custodians of the potentially relevant ESI.

In addition, if the enterprise has had previous experience with a specific “nature of suit,” the feedback loop may be different. Before a risk-specific model addresses the newly-collected corpus of ESI, the model may be re-trained on the documents that were tagged in previous collections as positive for that specific type of litigation.

And, after the system scores the newly assembled corpus of ESI, the documents assessed by users as True and False Positives may be used to re-train the model yet again.

In one or more embodiments, after the model is trained, the system is employed to score the documents in the corpus of ESI.

The system then identifies documents from that corpus of ESI which are related to the nature of the lawsuit. The identified documents may be flagged to users in the legal department for ECA purposes, for example. The users may then tag, save and assess the True Positives for case management purposes.

And both True Positives and False Positives may be saved and used for company-specific retraining of the model when a new lawsuit of the same case-type is filed against the enterprise in the future.
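The saving of reviewer tags for later retraining can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `ReviewLog`, its methods, and the sample documents are hypothetical, and the actual retraining call depends on the deep learning provider and is therefore not shown.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewLog:
    """Stores documents that users tagged as True or False Positives for one case-type."""

    case_type: str
    true_positives: list = field(default_factory=list)
    false_positives: list = field(default_factory=list)

    def tag(self, text, is_relevant):
        """Record a reviewer's decision on a document the model flagged."""
        target = self.true_positives if is_relevant else self.false_positives
        target.append(text)

    def retraining_examples(self):
        """Return (text, label) pairs: 1 for True Positives, 0 for False Positives."""
        return (
            [(text, 1) for text in self.true_positives]
            + [(text, 0) for text in self.false_positives]
        )


# Hypothetical review session for an employment discrimination matter.
log = ReviewLog(case_type="employment discrimination")
log.tag("Placeholder text of a flagged email judged relevant", is_relevant=True)
log.tag("Placeholder text of a flagged calendar entry", is_relevant=False)
examples = log.retraining_examples()
```

Keeping the False Positives alongside the True Positives gives the model company-specific negative examples, in the same spirit as the negative dataset used in the experiments above.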

While the invention herein disclosed has been described by means of specific embodiments and applications thereof, numerous modifications and variations could be made thereto by those skilled in the art without departing from the scope of the invention set forth in the claims.

What is claimed is:
 1. A method of using classified text and deep learning algorithms to assess risk and identify relevant documents comprising: creating one or more training datasets for textual data corresponding to a specific risk classification, wherein said risk classification comprises a nature of a recently filed lawsuit; training one or more deep learning algorithms using said one or more training datasets; collecting and extracting a corpus of documents comprising electronically stored information stored by an enterprise; applying said one or more deep learning algorithms to said corpus of documents to identify and report one or more documents of interest in the said corpus of documents for an early assessment of the potential harm to the enterprise of said lawsuit; determining if said identified one or more documents of interest is a false positive or a true positive; and re-training said one or more deep learning algorithms if said identified one or more documents of interest is a false positive.
 2. The method of claim 1, wherein said one or more deep learning algorithms is a framework for natural language processing of text.
 3. The method of claim 1, wherein said one or more deep learning algorithms is a recurrent neural network with a multiplicity of layers and various features, including but not limited to long short-term memory or gated recurrent units.
 4. The method of claim 1, wherein the one or more deep learning algorithms have been trained with different classifiers using previously classified data sourced and provided by a subject matter expert to become models for specific threats or risks of interest.
 5. The method of claim 1, wherein each one of said one or more deep learning algorithms has also been trained with one or more datasets unrelated to the threats or risks of interest.
 6. The method of claim 1, wherein said one or more training datasets is obtained by mining one or more litigation databases.
 7. A method of using classified text and deep learning algorithms to identify risk and provide early warning comprising: creating one or more training datasets by mining one or more litigation databases for textual data corresponding to a specific threat or risk of interest; training one or more deep learning algorithms using said one or more training datasets; collecting and extracting a corpus of documents comprising electronically stored information stored by an enterprise; applying said one or more deep learning algorithms to said corpus of documents to identify and report one or more documents of interest in the said corpus of documents for an early assessment of the potential harm to the enterprise of said lawsuit; determining if said identified one or more documents of interest is a false positive or a true positive; and re-training said one or more deep learning algorithms if said identified one or more documents of interest is a false positive.
 8. The method of claim 7, wherein each one of said one or more deep learning algorithms scores the data for accuracy with the deep learning model classification of the data.
 9. The method of claim 8, wherein said report comprises providing the scores and related data to one or more designated users.
 10. The method of claim 8, wherein the report may be limited to scores which surpass a specified threshold associated with each of said one or more deep learning algorithms.
 11. The method of claim 8, wherein the report and the documents of interest are exported to an existing case management system for investigation and review and possible further action.